Protein Science — Latest Matching Preprints

1

StabilizeIT: An Automated Workflow for Protein Stabilization

Kutnowski, N.; Budic, Y.; Alon, N.; Chalik, M.; Levin, I.; Lapidoth, G.; Zimmerman, L.

2025-10-10 bioinformatics 10.1101/2025.10.09.681370 medRxiv

Top 0.1%

59.5%

The industrial application of enzymes is often hampered by poor stability and low expression yields. While computational tools can predict stabilizing mutations, many are bound by restrictive licenses that hinder their broader adoption. To address this, we developed StabilizeIT, a powerful, open-access webserver for enhancing protein stability and expression. StabilizeIT integrates a pipeline of curated open-source tools such as ProteinMPNN, AlphaFold2 and SaProt with our state-of-the-art model, SolvIT, which accurately predicts heterologous expression titers in E. coli. This unique combination allows for the simultaneous optimization of melting temperature (Tm) and solubility. The pipeline exhibits remarkable speed, generating dozens of high-quality candidates with predicted high titers and increased stability in under an hour, streamlining the path to experimental validation. To demonstrate its efficacy, StabilizeIT was used to engineer multiple enzymes in our novel biosynthetic pathway for hyaluronic acid. The resulting variants showed greatly enhanced thermal stability and expression, demonstrating the pipeline's real-world utility. StabilizeIT is now available to the community, offering an accessible and validated solution to accelerate the development of robust proteins for diverse applications. The webserver is freely available at https://stabilizeit.enzymit.com

2

Global analysis of thermal and chemical denaturation using CheMelt: Thermodynamic dissection of highly thermostable de novo designed proteins

Lampinen, V.; Burastero, O.; Guazzelli, I. P.; Vogele, F.; Pinheiro, F.; Nowak, J. S.; Garcia Alai, M. M.; Kjaergaard, M.

2026-04-09 biophysics 10.64898/2026.04.07.716910 medRxiv

Top 0.1%

56.3%

De novo protein design often produces thermostable proteins that denature above 100 °C, which complicates the analysis of their stability. Thermostable proteins can be unfolded by combined chemical and thermal denaturation followed by global analysis of multiple melting curves. Here, we have developed CheMelt, a new online tool for global analysis of unfolding data via an intuitive graphical user interface. We use nanoscale differential scanning fluorimetry followed by CheMelt data analysis to dissect the combined thermal and chemical denaturation of thirty-five de novo designed protein binders. Fifteen present sufficient fluorescence changes to extract thermodynamic parameters of unfolding. These de novo designed proteins have systematically lower ΔCp and m-values than comparable natural proteins, which implies that they expose fewer hydrophobic residues upon unfolding. We show that a high thermostability of a designed protein does not necessarily imply a high equilibrium stability, and demonstrate the potential of CheMelt in dissecting thermodynamic properties for protein design and engineering.
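To make the roles of ΔCp and the m-value concrete, the sketch below evaluates a textbook two-state unfolding model that combines the Gibbs-Helmholtz equation with the linear extrapolation model for denaturant. It is an illustration only, not CheMelt's implementation: the function names, the parameter values, and the assumption that ΔCp and the m-value are independent of temperature and denaturant concentration are all ours.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_unfolding(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value):
    """Free energy of unfolding (kJ/mol) for an assumed two-state model.

    temp_k        temperature in kelvin
    denaturant_m  denaturant concentration in mol/L
    tm_k          melting temperature in kelvin at zero denaturant
    dh_m          unfolding enthalpy at tm_k in kJ/mol
    dcp           heat capacity change of unfolding in kJ/(mol*K)
    m_value       denaturant dependence of the free energy in kJ/(mol*M)
    """
    # Gibbs-Helmholtz term with a temperature-independent dcp.
    thermal = dh_m * (1.0 - temp_k / tm_k) + dcp * (
        temp_k - tm_k - temp_k * math.log(temp_k / tm_k)
    )
    # Linear extrapolation model: denaturant lowers the stability linearly.
    return thermal - m_value * denaturant_m


def fraction_unfolded(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value):
    """Equilibrium fraction of unfolded protein for the same model."""
    dg = delta_g_unfolding(temp_k, denaturant_m, tm_k, dh_m, dcp, m_value)
    k_unfold = math.exp(-dg / (R * temp_k))
    return k_unfold / (1.0 + k_unfold)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions.
    for denaturant in (0.0, 2.0, 4.0):
        fu = fraction_unfolded(350.0, denaturant, 350.0, 400.0, 5.0, 4.0)
        print(f"[D] = {denaturant:.1f} M -> fraction unfolded = {fu:.3f}")
```

In a global analysis, one parameter set of this kind would be fitted simultaneously to all melting curves recorded at different denaturant concentrations, which is what allows proteins that do not melt below 100 °C in buffer alone to be characterized.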

3

Categorizing prediction modes within low-pLDDT regions of AlphaFold2 structures

Williams, C. J.; Chen, V. B.; Richardson, D. C.; Richardson, J. S.

2025-06-07 biochemistry Community evaluation 10.1101/2025.06.06.658382 medRxiv

Top 0.1%

51.0%

AlphaFold2 protein structure predictions are widely available for structural biology uses. These predictions, especially for eukaryotic proteins, frequently contain extensive regions predicted below the pLDDT 70 level, the rule-of-thumb cutoff for high confidence. This work identifies major modes of behavior within low-pLDDT regions through a survey of human proteome predictions provided by the AlphaFold Protein Structure Database. The near-predictive mode resembles folded protein and can be a nearly accurate prediction. Barbed wire is extremely unproteinlike, being recognized by wide looping coils, an absence of packing contacts, and numerous signature validation outliers, and it likely represents a nonpredicted region. Pseudostructure presents an intermediate behavior with a misleading appearance of isolated and badly formed secondary structure-like elements. These prediction modes are compared with annotations of disorder from MobiDB, showing general correlation between barbed wire/pseudostructure and many measures of disorder, an association between pseudostructure and signal peptides, and an association between near-predictive and regions of conditional folding. To enable users to identify these regions within a prediction, a new Phenix tool is developed encompassing the results of this work, including prediction annotation, visual markup, and residue selection based on these prediction modes. This tool will help users develop expertise in interpreting difficult AlphaFold predictions and identify the near-predictive regions that can aid in molecular replacement when a prediction does not contain enough high-pLDDT regions.
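As a minimal sketch of the first step such an analysis needs, the function below locates contiguous stretches of residues below the pLDDT 70 cutoff in a list of per-residue scores. It does not classify the stretches into the near-predictive, pseudostructure, or barbed wire modes, and it is not the Phenix tool; the function name, the `min_length` parameter, and the example scores are hypothetical.

```python
def low_plddt_regions(plddt, cutoff=70.0, min_length=1):
    """Return (start, end) index pairs (0-based, inclusive) of contiguous
    residues whose pLDDT is below `cutoff`.

    plddt       sequence of per-residue pLDDT values
    cutoff      confidence threshold (70 is the rule-of-thumb cutoff)
    min_length  discard stretches shorter than this many residues
    """
    regions = []
    start = None
    for i, score in enumerate(plddt):
        if score < cutoff:
            if start is None:
                start = i  # a low-confidence stretch begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a stretch that runs to the end of the chain.
    if start is not None and len(plddt) - start >= min_length:
        regions.append((start, len(plddt) - 1))
    return regions


if __name__ == "__main__":
    scores = [92.1, 88.4, 65.0, 48.3, 51.7, 81.2, 90.5, 33.9]
    print(low_plddt_regions(scores))
```

For AlphaFold Protein Structure Database models in PDB format, the per-residue pLDDT is stored in the B-factor column, so the input list can be read from there.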

4

Three Essential Resources to Improve Differential Scanning Fluorimetry (DSF) Experiments

Wu, T.; Yu, J.; Gale-Day, Z.; Woo, A.; Suresh, A.; Hornsby, M.; Gestwicki, J. E.

2020-03-25 biochemistry 10.1101/2020.03.22.002543 medRxiv

Top 0.1%

49.5%

Differential Scanning Fluorimetry (DSF) is a method that enables rapid determination of a protein's apparent melting temperature (Tma). Owing to its high throughput, DSF has found widespread application in fields ranging from structural biology to chemical screening. Yet DSF has developed two opposing reputations: one as an indispensable laboratory tool to probe protein stability, another as a frustrating platform that often fails. Here, we aim to reconcile these disparate reputations and help users perform more successful DSF experiments with three resources: an updated, interactive theoretical framework, practical tips, and online data analysis. We anticipate that these resources, made available online at DSFworld (https://gestwickilab.shinyapps.io/dsfworld/), will broaden the utility of DSF.

5

De novo design of triosephosphate isomerases using generative language models

Romero-Romero, S.; Braun, A. E.; Kossendey, T.; Ferruz, N.; Schmidt, S.; Höcker, B.

2024-11-10 biochemistry 10.1101/2024.11.10.622869 medRxiv

Top 0.1%

45.7%

The design of proteins with tailored functions is of immense interest to biotechnology, medicine, and the chemical industry. While protein design is rapidly evolving with the use of AI techniques, the design of complex enzymes remains a challenge. Here, we present the use of two large language models (LLMs), ZymCTRL and ProtGPT2, for the generation of de novo enzymes that catalyze the triosephosphate isomerase (TIM) reaction. Natural TIM enzymes are obligatory oligomers that catalyze a multi-step isomerization reaction near the diffusion limit. This makes TIM an ideal target to assess the generative ability of protein language models. Newly generated sequences were filtered to obtain a set of twelve candidates from each approach for experimental validation. Multiple constructs from both language models exhibit the intended function in vivo through their ability to complement a TIM-deficient E. coli strain. In-depth characterization of the best-behaving artificial enzyme reveals behavior and catalytic efficiency close to its natural counterparts. These findings support the use of conditional and fine-tuned unconditional LLMs for the generation of complex enzymes.

6

Sequence-encoded differences in the conformational ensembles of CITED transcriptional activation domains impact coactivator binding

Do, T. U.; Kraft, E. J.; Chappell, G. F.; Parnham, S.; Berlow, R. B.

2026-01-21 biophysics 10.64898/2026.01.20.700670 medRxiv

Top 0.1%

45.2%

Recent advances in predicting and modeling conformational ensembles of intrinsically disordered proteins (IDPs) have provided much-needed insights into sequence-ensemble relationships. It is thought that conservation of physicochemical properties, but not the exact identity or order of the amino acids, maintains IDP ensemble properties that are crucial for function. However, detailed experimental studies are still required to fully understand the relationships between sequence and function in IDPs. The human CITED proteins, which are fully disordered transcriptional regulators, share conserved C-terminal transactivation domains (CTADs) that interact with the TAZ1 domain of the transcriptional coactivators CBP/p300. The conserved CTADs harbor amino acid substitutions in regions that are known to be important for interactions of CITED2 with TAZ1, but the effects of these substitutions on TAZ1 binding for the other CITED proteins are unknown. Here, we use solution NMR spectroscopy, circular dichroism, and surface plasmon resonance to characterize the conformational ensembles, dynamics, and interactions of the CITED CTADs. The CTADs are disordered in isolation, although the CITED2 CTAD uniquely displays residual helical structure that is sensitive to ionic strength and protein concentration. In contrast, the CITED1 and CITED4 CTADs remain largely disordered and exhibit more uniform dynamics. Quantitative binding measurements reveal differences in thermodynamics and kinetics for the CTADs' interactions with TAZ1, with CITED2 binding most tightly and CITED4 exhibiting significantly weaker affinity. Our results highlight the sensitivity of IDP conformational ensembles to minor sequence changes and the impacts that changes in IDP structures and dynamics can have on biological functions.

7

Protein stability is determined by single-site bias rather than pairwise covariance

Sternke, M.; Tripp, K. W.; Barrick, D.

2025-01-14 biophysics 10.1101/2025.01.09.632118 medRxiv

Top 0.1%

44.9%

The biases revealed in protein sequence alignments have been shown to provide information related to protein structure, stability, and function. For example, sequence biases at individual positions can be used to design consensus proteins that are often more stable than naturally occurring counterparts. Likewise, correlations between pairs of residues can be used to predict protein structures. Recent work using Potts models shows that together, single-site biases and pair correlations lead to improved predictions of protein fitness, activity, and stability. Here we use a Potts model to design groups of protein sequences with different amounts of single-site biases and pair correlations, and determine the thermodynamic stabilities of a representative set of sequences from each group. Surprisingly, sequences excluding pair correlations maximize stability, whereas sequences that maximize pair correlations are less stable, suggesting that pair correlations contribute to another aspect of protein fitness. Consistent with this interpretation, we find that for adenylate kinase, enzyme activity is greatly increased by maximizing pair correlations. The finding that elimination of covariant residue pairs increases protein stability suggests a route to enhance stability of designed proteins; indeed, this strategy produces hyperstable homeodomain and adenylate kinase proteins that retain significant activity.
Significance statement: Recent methods for protein structure analysis and design have used sequence covariance to help predict protein structure, stability, and function. Here, by designing homeodomain and adenylate kinase sequences with different amounts of single-site bias and pairwise covariance, we find that stability is determined by single-site bias and not by pairwise covariance. However, pairwise covariance makes an important contribution to catalysis in adenylate kinase. Our findings suggest a new way to generate highly stable proteins: by separating single-site biases from pairwise covariance, the single-site coefficients can be used to design proteins with stabilities even higher than those obtained by consensus design.
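The separation of single-site and pairwise terms can be illustrated with a toy Potts-style energy function. This is a sketch under our own assumptions, not the authors' model: the sign convention (lower energy means a more favorable sequence), the dictionary layout, and the parameter values are all hypothetical.

```python
def potts_energy(seq, fields, couplings):
    """Split a Potts-style energy into single-site and pairwise parts.

    seq        sequence as a string, one character per position
    fields     list of dicts; fields[i][a] is the single-site term for
               residue a at position i
    couplings  dict mapping (i, j) with i < j to a dict whose keys are
               (a, b) residue pairs and whose values are pair terms

    Returns (single_site_energy, pairwise_energy, total_energy) using the
    assumed convention E = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j).
    Missing entries are treated as zero.
    """
    single = sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    pair = 0.0
    for (i, j), table in couplings.items():
        pair += table.get((seq[i], seq[j]), 0.0)
    return -single, -pair, -(single + pair)


if __name__ == "__main__":
    # Hypothetical two-position model.
    fields = [{"A": 1.0, "G": 0.0}, {"L": 0.5, "V": 0.25}]
    couplings = {(0, 1): {("A", "L"): 0.25, ("G", "V"): 1.0}}
    for candidate in ("AL", "GV"):
        print(candidate, potts_energy(candidate, fields, couplings))
```

In this toy model "AL" is favored by the single-site terms and "GV" by the pair term, which mirrors the distinction the abstract draws between sequences designed from single-site bias and sequences designed to maximize pair correlations.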

8

Expression, purification, and characterization of diacylated Lipo-YcjN from Escherichia coli

Trevino, M. A.; Amankwah, K.; Fernandez, D.; Weston, S.; Stewart, C. J.; Morales Gallardo, J.; Shahgholi, M.; Sharaf, N. G.

2024-09-07 biophysics 10.1101/2024.09.05.611266 medRxiv

Top 0.1%

44.7%

YcjN is a putative substrate-binding protein expressed from a cluster of genes involved in carbohydrate import and metabolism in Escherichia coli. Here, we determine the crystal structure of YcjN to a resolution of 1.95 Å, revealing that its three-dimensional structure is similar to substrate-binding proteins in subcluster D-I, which includes the well-characterized maltose-binding protein (MBP). Furthermore, we found that recombinant overexpression of YcjN results in the formation of a lipidated form of YcjN that is posttranslationally diacylated at cysteine 21. Comparisons of size-exclusion chromatography profiles and dynamic light scattering measurements of lipidated and non-lipidated YcjN proteins suggest that lipidated YcjN aggregates in solution via its lipid moiety. Additionally, bioinformatic analysis indicates that YcjN-like proteins may exist in both Bacteria and Archaea, potentially in both lipidated and non-lipidated forms. Together, our results provide a better understanding of the aggregation properties of recombinantly expressed bacterial lipoproteins in solution and establish a foundation for future studies that aim to elucidate the role of these proteins in bacterial physiology.

9

Accurate Protein Domain Structure Annotation with DomainMapper

Manriquez-Sandoval, E.; Fried, S. D.

2022-03-20 bioinformatics 10.1101/2022.03.19.484986 medRxiv

Top 0.1%

41.8%

Automated domain annotation plays a number of important roles in structural informatics and typically involves searching query sequences against Hidden Markov Model (HMM) profiles. This process can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to any query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (6.5% of all domains in yeast and 2.5% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life, likely underlying evolutionary preferences for domain topology. DomainMapper is freely available and can be run as a single command line function.
Highlights: DomainMapper generates a unique domain structure annotation, including non-contiguous and insertional domains; automated annotations of non-contiguous domains are validated against the AlphaFold database; DomainMapper can be easily installed and used by non-experts; certain folds have strong preferences to be non-contiguous or insertional.

10

Context-Dependent Design of Induced-fit Enzymes using Deep Learning Generates Well Expressed, Thermally Stable and Active Enzymes

Zimmerman, L.; Alon, N.; Levin, I.; Koganitsky, A.; Brestel, C.; Lapidoth, G. D.

2023-07-29 bioinformatics 10.1101/2023.07.27.550799 medRxiv

Top 0.1%

40.6%

The potential of engineered enzymes in practical applications is often constrained by limitations in their expression levels, thermal stability, and the diversity and magnitude of catalytic activities. De novo enzyme design, though exciting, is challenged by the complex nature of enzymatic catalysis. An alternative promising approach involves expanding the capabilities of existing natural enzymes to enable functionality across new substrates and operational parameters. To this end, we introduce CoSaNN (Conformation Sampling using Neural Network), a novel strategy for enzyme design that utilizes advances in deep learning for structure prediction and sequence optimization. By controlling enzyme conformations, we can expand the chemical space beyond the reach of simple mutagenesis. CoSaNN uses a context-dependent approach that accurately generates novel enzyme designs by considering non-linear relationships in both sequence and structure space. Additionally, we have further developed SolvIT, a graph neural network trained to predict protein solubility in E. coli, as an additional optimization layer for producing highly expressed enzymes. Through this approach, we have engineered novel enzymes exhibiting superior expression levels, with 54% of our designs expressed in E. coli, and increased thermal stability, with more than 30% of our designs having a higher Tm than the template enzyme. Furthermore, our research underscores the transformative potential of AI in protein design, adeptly capturing high-order interactions and preserving allosteric mechanisms in extensively modified enzymes. These advancements pave the way for the creation of diverse, functional, and robust enzymes, thereby opening new avenues for targeted biotechnological applications.

11

An amyloidosis-associated polymorphism does not alter LECT2 stability in vitro

Belonogov, L.; Taylor, P. E.; Wong, S.; Morgan, G. J.

2022-03-01 biochemistry 10.1101/2022.03.01.482540 medRxiv

Top 0.1%

40.4%

Amyloid fibrils formed from leukocyte chemotactic factor 2 (LECT2), a secreted human cytokine, are associated with kidney failure in the disease amyloid LECT2 (ALECT2) amyloidosis. This rare disease was recognized in 2008 and has a variable prevalence worldwide. The mechanisms which lead to ALECT2 fibril deposition are not known and there are no treatments other than kidney transplant. The LECT2 gene harbors a single nucleotide polymorphism that leads to either a valine or isoleucine residue at position 40 of the mature protein. Most of the individuals diagnosed with ALECT2 amyloidosis are homozygous for valine at this position, which led us to hypothesize that the valine-containing variant of LECT2 protein is less stable and more prone to aggregation than the isoleucine-containing variant. Here, we investigate the structure, stability and aggregation of both variants of recombinant LECT2. Both variants have similar structures in solution; unfold in similar concentrations of urea; and aggregate at similar rates under native-like conditions, forming structures that bind to thioflavin T. Chelation of the structural zinc ion destabilizes both variants to a similar extent, and increases the rate at which they aggregate. We do not observe a consistent difference in stability or aggregation between the variants of LECT2, so we suggest that the presence of the valine residue at position 40 does not determine whether an individual is at increased risk of ALECT2 amyloidosis.

12

The most probable ancestral sequence reconstruction yields proteins without systematic bias in thermal stability or activity

Theobald, D.; Sennett, M. A.; Beckett, B. C.

2023-02-22 biochemistry 10.1101/2023.02.22.529562 medRxiv

Top 0.1%

40.0%

Ancestral sequence resurrection (ASR) is the inference of extinct biological sequences from extant sequences; the most popular ASR methods are based on probabilistic models of evolution. ASR is becoming a popular method for studying the evolution of enzyme characteristics. The properties of ancestral enzymes are biochemically and biophysically characterized to gain knowledge regarding the origin of a given enzyme property. Current methodology relies on resurrection of the single most probable (SMP) sequence and is systematically biased. Previous theoretical work suggests this will result in a bias in the thermostability, and possibly even the activity, of resurrected SMP sequences, calling into question inferences derived from ancestral protein properties. We experimentally test the potential stability bias hypothesis by resurrecting 40 malate and lactate dehydrogenases. Despite the methodological bias in resurrecting an SMP protein, the measured biophysical and biochemical properties of the SMP protein are not biased in comparison to other, less probable, resurrections. In addition, the SMP protein property seems to be representative of the ancestral probability distribution. Therefore, the conclusions and inferences drawn from the SMP protein are likely not a source of bias.
Significance: Ancestral sequence resurrection (ASR) is a powerful tool for determining how new protein functions evolve, inferring the properties of an environment in which species existed, and protein engineering applications. We demonstrate, using lactate and malate dehydrogenases (L/MDHs), that resurrecting the single most probable sequence (SMP) from a maximum likelihood phylogeny does not result in biased activity and stability relative to sequences sampled from the posterior probability distribution. Previous studies using experimentally measured phenotypes of SMP sequences to make inferences about the environmental conditions and the path of evolution are likely not biased in their conclusions. Serendipitously, we discover ASR is also a valid tool for protein engineering because sampled reconstructions are both highly active and stable.

13

Redesigning OmpA Loops Using Canonical Outer Membrane Protein Loop Structures

Franklin, M. W.; Krise, J.; Stevens, J. J.; Slusky, J. S. G.

2020-10-08 biophysics 10.1101/2020.10.08.331546 medRxiv

Top 0.1%

39.9%

Outer membrane proteins are all beta barrels and these barrels have a variety of well-documented loop conformations. Here we test the effect of three different loop types on outer membrane protein A (OmpA) folding. We designed twelve 5-residue loops and experimentally tested the effect of replacing the long loops of outer membrane protein OmpA with the designed loops. Our studies succeeded in creating the smallest known outer membrane barrel. We find that significant changes in OmpA loops do not have a strong overall effect on OmpA folding. However, when decomposing folding into a fast rate and a slow rate we find that changes in loops strongly affect the slow rate of OmpA folding. Extracellular loop types with higher levels of hydrogen bonds had more instances of increasing the slow folding rate and extracellular loop types with low levels of hydrogen bonds had more instances of decreasing the slow folding rate. Having the slow rate affected by loop composition is consistent with the slow rate being associated with the insertion step of outer membrane protein folding.

14

Protein function prediction in genomes: Critical assessment of coiled-coil predictions based on protein structure data

Simm, D.; Hatje, K.; Waack, S.; Kollmar, M.

2019-06-18 bioinformatics 10.1101/675025 medRxiv

Top 0.1%

39.6%

Coiled-coil regions were among the first protein motifs described structurally and theoretically. The beauty and simplicity of the motif gives hope of detecting coiled-coil regions with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank (PDB), down to each amino acid and its secondary structure. Apart from the thirtyfold difference in the number of predicted coiled-coils, the tools strongly vary in their predictions, across structures and within structures. The evaluation of the false discovery rate and Matthews correlation coefficient, a widely used performance metric for imbalanced data sets, suggests that the tested tools have only limited applicability for large data sets. Coiled-coil predictions strongly impact the functional characterization of proteins, are used for functional genome annotation, and should therefore be supported and validated by additional information.
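For readers unfamiliar with the two metrics named above, the following sketch computes the false discovery rate and the Matthews correlation coefficient from per-residue binary labels (1 for coiled-coil, 0 otherwise). It uses the standard definitions; the function names and the example labels are hypothetical, and returning 0.0 when a denominator is zero is a convention we chose, not something stated in the abstract.

```python
import math


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives over paired binary labels."""
    tp = tn = fp = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def false_discovery_rate(tp, fp):
    """FDR = FP / (FP + TP): the share of positive calls that are wrong."""
    return 0.0 if tp + fp == 0 else fp / (tp + fp)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    actual = [1, 0, 0, 0, 1, 1]
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    print("MCC:", matthews_corrcoef(tp, tn, fp, fn))
    print("FDR:", false_discovery_rate(tp, fp))
```

Because coiled-coil residues are a small minority of all residues in the PDB, a tool can reach high accuracy by rarely predicting coiled-coils at all; MCC penalizes that behavior, which is why it is preferred for imbalanced data.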

15

Configurational entropy-based screening and selection of hydrophilic polymers using the tripartite split green fluorescent protein

Banerjee, S.; Minko, Y.; Anaya, E. S.; Sasiene, Z. J.; Schmidt, J. G.; Strauss, C. E. M.; Waldo, G. S.

2022-11-08 biophysics 10.1101/2022.11.07.515508 medRxiv

Top 0.1%

39.5%

Measuring the entropic properties of polymers such as proteins is critical to accurate prediction of their functional properties. However, the measurement of configurational entropy is possible only by low-throughput techniques such as calorimetry, NMR and CD spectroscopy. Moreover, to our knowledge no system exists that allows molecular selection/enrichment based on the molecules' configurational entropy. We tested the ability of the scalable tripartite GFP system to offer fine resolution of differences in configurational entropy in molecules and to isolate molecules based on their configurational entropy. The system was able both to finely resolve molecules with different configurational entropies and to capture them for isolation. We were able to tune the sensitivity of the system by using different mutations of the protein components. Lastly, we were able to apply the system to polypeptoid molecules and posit that the system may be applied to any other hydrophilic polymer of up to 10^3 repeating units.

16

Protein engineering shows antifreeze activity scales with ice-binding site area

Scholl, C. L.; Davies, P. L.

2022-09-07 biochemistry 10.1101/2022.09.07.506985 medRxiv

Top 0.1%

38.9%

The ice-binding site (IBS) of the 9.6-kDa springtail (Collembola) antifreeze protein from Granisotoma rainieri was identified by mutagenesis. We then studied the protein's activity as a function of IBS area. Its polyproline type II helical bundle fold facilitates changes to both IBS length and width. A one-third increase in IBS width, through the addition of a single helix, doubled antifreeze activity. A one-third decrease in area reduced activity to 10%. A construct engineered with an additional tripeptide turn in each helix displayed a 5-fold decrease in activity. Molecular dynamics suggested that the lengthened IBS is more twisted than the wild type, emphasizing the importance of a flat surface for antifreeze activity.

17

MolPhase: An Advanced Phase Separation Predictor and an Investigation of Phytobacterial Effector in Plant

Liang, Q.; Peng, N.; Xie, Y.; Kumar, N.; Gao, W.; Miao, Y.

2023-09-21 biophysics 10.1101/2023.09.21.558813 medRxiv

Top 0.1%

38.9%

We introduce MolPhase (http://molphase.sbs.ntu.edu.sg/), an advanced protein phase separation (PS) prediction algorithm that improves accuracy and reliability by utilizing diverse physicochemical features and extensive experimental datasets. MolPhase applies a user-friendly interface to compare distinct biophysical features side-by-side along protein sequences. By additional comparison with structural predictions, MolPhase enables efficient predictions of new phase-separating proteins and guides hypothesis generation and experimental design. Key contributing factors underlying MolPhase include pi-pi interaction, disorder, and prion-like domain. As an example, MolPhase finds that phytobacterial type III effectors (T3Es) are highly prone to homotypic PS, which was experimentally validated in vitro biochemically and in vivo in plants, mimicking their injection and accumulation in the host during microbial infection. In addition, the phase separation of T3Es was evolved both in vivo and in vitro, suggesting their determinative scaffolding function, though there is a difference in material properties, implying a difference in homotypic and heterotypic macromolecular condensation. Robust integration of MolPhase's effective prediction and experimental validation exhibits the potential to evaluate and explore how biomolecule PS functions in biological systems.

18

Computational Redesign of an Antifreeze Protein Using Deep Learning

Calia, C.; Altunc, A. J.; Eufemio, R. J.; Alvarado, B. O.; Huynh, J. D.; Oh, E.; Burkart, M.; Meister, K.; Paesani, F.

2026-06-24 biophysics 10.64898/2026.06.21.733612 medRxiv

Top 0.1%

38.9%

Antifreeze proteins (AFPs) found in various cold-adapted organisms inhibit ice growth and are of interest for applications in food products, cryopreservation, agriculture, and materials science. Although high-resolution structures are available for several AFPs, the amino acids required for full antifreeze activity remain incompletely defined, and the development of AFP variants with properties such as enhanced solubility, high expression yield, and improved thermostability may further facilitate applications. Here, we used the deep learning model ProteinMPNN to redesign the globular fish antifreeze protein AFPIII, keeping the previously reported ice-binding residues fixed. We readily obtained sequences confidently predicted to adopt AFPIII's structure and we selected five designed variants for expression, all of which expressed efficiently in E. coli. Circular dichroism spectroscopy showed that two of these variants retained secondary structure elements consistent with AFPIII, whereas the other three exhibited structural differences. One design was predicted and experimentally confirmed to have increased thermostability. All five variants displayed measurable thermal hysteresis activity. However, none reached the activity of wild-type AFPIII, suggesting that maintaining the currently established set of ice-binding residues is not sufficient to fully preserve this AFP's function; other, unidentified residues can also impact its activity. Our findings highlight the value of deep learning-based protein design methods both for generating AFP variants with desirable properties and for uncovering gaps in existing knowledge of well-characterized AFPs.

19

AlphaFold3 and Intrinsically Disordered Proteins: Reliable Monomer Prediction, Unpredictable Multimer Performance

Dao, T. M.; Ghent, S.; Uversky, V. N.; Rahman, T.

2025-12-10 bioinformatics 10.64898/2025.12.05.691730 medRxiv

Top 0.1%

38.0%

AlphaFold3 represents a major advance in protein structure prediction, yet its performance on intrinsically disordered proteins remains uncharacterized. We present the first systematic evaluation of AF3 on disordered systems, revealing a striking dichotomy. For monomers, AF3's pLDDT scores reliably predict disorder (MCC: 0.693), matching AlphaFold2 and rivaling dedicated predictors. This consistency across fundamentally different architectures confirms that disorder prediction emerges from training data, not model design. For multimers, the picture grows complex. Despite comparable aggregate performance (mean DockQ: 0.563 vs 0.571), AF3 and AF2 achieve these results through fundamentally different mechanisms. Conventional structural features explain 58% of AF2's variance but only 42% of AF3's. Users cannot predict when AF3 will succeed or fail from interface properties alone. On disorder-to-order transitions (MFIB benchmark), both models perform equally well, successfully predicting final folded states. Yet seed variance analysis reveals AF3's failures are deterministic: the model converges to identical structures across independent runs, whether correct or incorrect, indicating rigid structural priors override available information. Our findings establish AF3 as reliable for the prediction of monomer disorder but unpredictable for multimers. Architectural innovation alone cannot overcome training data bias. Progress demands disorder-enriched datasets and ensemble sampling, not merely novel architectures.

20

Thermodynamic analysis of GASright dimerization supports a model in which stability is modulated by weak hydrogen bonding and van der Waals packing

Vazquez, G. D.; Cui, Q.; Senes, A.

2022-07-13 biophysics 10.1101/2022.07.11.499632 medRxiv

Top 0.1%

37.7%

The GASright motif, best known as the fold of the glycophorin A transmembrane dimer, is one of the most common dimerization motifs in membrane proteins, characterized by its hallmark GxxxG-like sequence motifs (GxxxG, AxxxG, GxxxS, and similar). Structurally, GASright displays a right-handed crossing angle and short inter-helical distance. Contact between the helical backbones favors the formation of networks of weak hydrogen bonds between C-H carbon donors and carbonyl acceptors on opposing helices (C-H···O=C). To understand the factors that modulate the stability of GASright, we previously presented a computational and experimental structure-based analysis of 26 predicted dimers. We found that the contributions of van der Waals packing and C-H hydrogen bonding to stability, as inferred from the structural models, correlated well with relative dimerization propensities estimated experimentally with the in vivo assay TOXCAT. Here we test this model with a quantitative thermodynamic analysis. We used FRET to determine the free energy of dimerization of a representative subset of 7 of the 26 original TOXCAT dimers. To overcome the technical issue arising from limited sampling of the dimerization isotherm, we introduced a global fitting strategy across a set of constructs comprising a wide range of stabilities. This strategy yielded precise thermodynamic data that show strikingly good agreement between the original propensities and ΔG° of association in detergent, suggesting that TOXCAT is a thermodynamically driven process. From the correlation between TOXCAT and thermodynamic stability, the predicted free energy for all the 26 GASright dimers was calculated. These energies correlate with the in silico ΔE scores of dimerization that were computed on the basis of their predicted structure. These findings corroborate our original model with quantitative thermodynamic evidence, strengthening the hypothesis that van der Waals and C-H hydrogen bond interactions are the key modulators of GASright stability.
Secondary abstract: We present a thermodynamic analysis of the dimerization of the GASright motif, a common dimerization motif in membrane proteins. Previously, we found that the stability of GASright is modulated by van der Waals packing and weak hydrogen bonds between C-H carbon donors and carbonyl acceptors on opposing helices. The experimental dimerization propensities were obtained with an in vivo assay. Here we assess this model quantitatively by measuring the free energy of dimerization of a subset of the original dimers. The thermodynamic data show strikingly good agreement between the original propensities and their ΔG° of association, confirming the model and strengthening the hypothesis that van der Waals and C-H hydrogen bond interactions are the key modulators of GASright stability.